learn: prototype sandlock learn subcommand #113

Open
ghazariann wants to merge 10 commits into multikernel:main from ghazariann:feat/learn
@ghazariann

Contributor

Prototype implementation of sandlock learn (#72).

Summary

Adds sandlock learn -o profile.toml -- <cmd> which runs a workload under observation
and writes a sandlock profile TOML directly usable by sandlock run -p profile.toml.

| Domain | Implementation |
|---|---|
| Filesystem reads | on_file_access audit hook in supervisor on openat / open / execve / execveat |
| Filesystem writes | same hook; classified by open flags (O_WRONLY / O_RDWR / O_CREAT) |
| Network egress (TCP / UDP) | on_net_connect audit hook in supervisor on connect / sendto / sendmsg |
| HTTP method + host + path | not yet |
| Syscalls | not yet |
| Resource peaks | not yet |

Implementation

Runs the workload under fully-permissive Landlock and intercepts syscalls via two
audit hooks added to the sandlock-core supervisor, called before dispatch on every
notification:

  • on_file_access(path, flags): openat / open / execve / execveat
  • on_net_connect(ip, port): connect / sendto / sendmsg

Results are collected and serialized to a ProfileInput TOML.
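
To make the classification concrete, here is a minimal sketch of how the two hooks could feed an accumulator. It is not the code in this PR: `LearnedProfile`, `is_write`, and the field names are hypothetical, and the flag constants are the Linux x86_64 values written out by hand (real code would take them from the `libc` crate). It assumes an O_RDWR open is recorded as a write only; recording it under both read and write is an equally plausible choice.

```rust
use std::collections::BTreeSet;

// Linux open(2) flag values on x86_64, hard-coded for this sketch only.
const O_WRONLY: i32 = 0o1;
const O_RDWR: i32 = 0o2;
const O_ACCMODE: i32 = 0o3;
const O_CREAT: i32 = 0o100;

/// Hypothetical accumulator for what the audit hooks observe.
#[derive(Default, Debug)]
pub struct LearnedProfile {
    pub read: BTreeSet<String>,
    pub write: BTreeSet<String>,
    pub connect: BTreeSet<(String, u16)>,
}

/// An open counts as a write if its access mode is O_WRONLY or O_RDWR,
/// or if O_CREAT is set.
pub fn is_write(flags: i32) -> bool {
    let mode = flags & O_ACCMODE;
    mode == O_WRONLY || mode == O_RDWR || (flags & O_CREAT) != 0
}

impl LearnedProfile {
    /// Mirrors the shape of on_file_access(path, flags).
    pub fn on_file_access(&mut self, path: &str, flags: i32) {
        if is_write(flags) {
            self.write.insert(path.to_string());
        } else {
            self.read.insert(path.to_string());
        }
    }

    /// Mirrors the shape of on_net_connect(ip, port).
    pub fn on_net_connect(&mut self, ip: &str, port: u16) {
        self.connect.insert((ip.to_string(), port));
    }
}
```

Sorted sets keep the output deterministic and deduplicated, which matters if the profile is meant to be diffed or merged across runs.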

Discussion points

1. Capturing the executed binary

For sandlock learn -- cat /etc/hostname, /usr/bin/cat must appear in the profile so sandlock run can allow it via Landlock. The binary is loaded through execve, not openat, so the on_file_access hook alone is not enough.

Adding execve to the hook condition in handle_notification was not sufficient on its own. The seccomp BPF filter decides which syscalls generate a SECCOMP_RET_USER_NOTIF, and for a basic sandbox (fs_read, fs_write) execve was not in that list. It only entered notif_syscalls for heavier features (COW, chroot). The notification never reached the supervisor.

The fix: a new audit_file_access feature flag in SandboxFeatures that is true when on_file_access is set. In notif_syscalls_resolved this adds execve/execveat to the BPF notif list. resolve_path_for_notif already handled execve, so no other supervisor logic changed.
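
The gating described above can be sketched roughly as follows. This is a simplified stand-in, not the real sandlock-core code: the struct fields other than `audit_file_access`, the baseline syscall list, and the use of syscall names instead of numbers are all assumptions made for illustration.

```rust
/// Hypothetical, simplified stand-in for the real `SandboxFeatures`.
#[derive(Default)]
pub struct SandboxFeatures {
    pub cow: bool,
    pub chroot: bool,
    /// True when an on_file_access hook is installed.
    pub audit_file_access: bool,
}

/// Returns the syscalls that should produce SECCOMP_RET_USER_NOTIF.
/// The baseline list here is a placeholder, not the real one.
pub fn notif_syscalls_resolved(features: &SandboxFeatures) -> Vec<&'static str> {
    let mut calls = vec!["openat", "open"];
    // execve/execveat only reach the supervisor when a feature needs them.
    if features.cow || features.chroot || features.audit_file_access {
        calls.push("execve");
        calls.push("execveat");
    }
    calls
}
```

With this shape, a basic sandbox keeps its smaller notification set, and only `sandlock learn` (or any other caller that installs the hook) pays for the extra execve notifications.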

Is this the right place to wire this? Should execve have its own separate hook (on_execve) instead of being folded into on_file_access?

2. Resolving the dynamic linker

After execve, the kernel maps the dynamic linker (e.g. /lib64/ld-linux-x86-64.so.2) in kernel space before transferring control to userspace. No syscall fires for this, so the linker never appears in the on_file_access trace. Without it in the profile, sandlock run fails: Landlock blocks the read of the linker and the process cannot start at all.

The current workaround parses the ELF PT_INTERP segment of the binary (ELF64 only) to recover the interpreter path. This is ad-hoc and not portable (assumes ELF64, specific header offsets, and manual endianness handling).
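
For reference, a PT_INTERP lookup of this kind can be sketched as below. This is an illustration, not the code in this PR: `elf64_interp` is a hypothetical name, and the sketch handles only little-endian ELF64 images that are already in memory. The offsets are the standard ELF64 header and program-header layout.

```rust
use std::convert::{TryFrom, TryInto};

const PT_INTERP: u32 = 3;

// Bounds-checked little-endian readers; any out-of-range read yields None.
fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(off..off.checked_add(2)?)?.try_into().ok()?))
}
fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(off..off.checked_add(4)?)?.try_into().ok()?))
}
fn u64_at(b: &[u8], off: usize) -> Option<u64> {
    Some(u64::from_le_bytes(b.get(off..off.checked_add(8)?)?.try_into().ok()?))
}

/// Returns the interpreter path of a little-endian ELF64 image, if any.
pub fn elf64_interp(image: &[u8]) -> Option<String> {
    if !image.starts_with(b"\x7fELF") {
        return None;
    }
    // e_ident[4] == ELFCLASS64 (2), e_ident[5] == ELFDATA2LSB (1).
    if *image.get(4)? != 2 || *image.get(5)? != 1 {
        return None;
    }
    let phoff = usize::try_from(u64_at(image, 32)?).ok()?; // e_phoff
    let phentsize = usize::from(u16_at(image, 54)?); // e_phentsize
    let phnum = usize::from(u16_at(image, 56)?); // e_phnum
    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        if u32_at(image, ph)? != PT_INTERP {
            continue;
        }
        let off = usize::try_from(u64_at(image, ph.checked_add(8)?)?).ok()?; // p_offset
        let len = usize::try_from(u64_at(image, ph.checked_add(32)?)?).ok()?; // p_filesz
        let raw = image.get(off..off.checked_add(len)?)?;
        // The segment holds a NUL-terminated path.
        let end = raw.iter().position(|&c| c == 0).unwrap_or(raw.len());
        return String::from_utf8(raw[..end].to_vec()).ok();
    }
    None
}
```

Statically linked binaries have no PT_INTERP segment, so `None` is a normal result there rather than an error.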

One idea: the linker appears in /proc/<pid>/maps after execve completes. The supervisor already reads /proc/<pid>/maps for vDSO patching (maybe_patch_vdso), so the pattern exists. But I'm not sure about the timing, because the execve notification fires before the kernel completes the exec.
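
If the /proc/<pid>/maps route is taken, the parsing side is small. The sketch below only extracts file-backed paths from the text of a maps file; `mapped_files` is a hypothetical helper, and it does nothing about the timing question above, since it is only meaningful once the new image has been mapped.

```rust
use std::collections::BTreeSet;

/// Collects the distinct absolute paths of file-backed mappings from the
/// contents of a /proc/<pid>/maps file.
pub fn mapped_files(maps: &str) -> BTreeSet<String> {
    maps.lines()
        // Fields: address perms offset dev inode [pathname]. The first five
        // are single-space separated; the pathname follows space padding.
        .filter_map(|line| line.splitn(6, ' ').nth(5))
        .map(|rest| rest.trim())
        // Skips anonymous mappings and pseudo-entries such as [stack].
        .filter(|path| path.starts_with('/'))
        .map(String::from)
        .collect()
}
```

The result would include the executable, the dynamic linker, and any shared libraries already mapped at the moment of the read, so it could also serve as a cross-check against the paths seen by on_file_access.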

I'm not that experienced in kernel development and would love guidance on the right approach here.

3. Achieving permissive Landlock during observation

The spec calls for permissive Landlock + seccomp-notify during observation. For reads this is straightforward: .fs_read("/"). For writes, .fs_write("/") is not an option: the observation run must be non-destructive, leaving no trace on the real filesystem.

The current approach pairs .fs_read("/") with .workdir(tempdir) (COW overlay): writes are granted everywhere but redirected to a temporary overlay, so the real filesystem is never touched. The on_file_access hook fires before the COW redirect, so it always sees the original requested path, which is what ends up in the profile.

Is COW the right way to achieve a fully permissive observation environment? Is there a lighter approach?

What still needs to be done

  • Path collapsing
  • Merging and iteration
  • HTTP method + host + path (http.allow)
  • Syscall counting (--learn-syscalls)
  • Resource peaks via /proc sampling

Tests

  • test_learn_captures_fs_read — runs cat /etc/hostname, checks /etc/hostname appears under read
  • test_learn_then_run — full round-trip: learn generates profile from cat /etc/hostname, sandlock run uses it
  • test_learn_captures_fs_write — writes to a pre-existing NamedTempFile (file exists before learn runs), checks path appears under write
  • test_learn_new_file_collapses_to_parent — writes to a file that does not exist; checks the profile records the parent directory, and the real file is never created (COW isolation)
  • test_learn_then_run_write — round-trip for writes: learn captures a write, run actually creates the file
  • test_learn_captures_net_connect — binds a real TcpListener, runs a Python connect, checks the address appears under [network] allow
  • test_learn_then_run_network — round-trip for network: single listener accepts two connections, one from learn and one from run
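
The parent-directory behaviour checked by test_learn_new_file_collapses_to_parent can be sketched as below. This is an assumption about the rule's shape rather than the PR's implementation, and `write_rule_for` is a hypothetical name. The motivation is that a Landlock rule is attached to an already-open file or directory, so a path that does not exist yet cannot carry its own rule.

```rust
use std::path::{Path, PathBuf};

/// Picks the path to record for an observed write: the file itself when it
/// already exists, otherwise its parent directory.
pub fn write_rule_for(path: &Path) -> PathBuf {
    if path.exists() {
        return path.to_path_buf();
    }
    match path.parent() {
        Some(parent) => parent.to_path_buf(),
        // No parent (e.g. "/"): fall back to the path as given.
        None => path.to_path_buf(),
    }
}
```

Granting the parent is broader than the single file that was written, which is one reason the path-collapsing step listed under remaining work needs care.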

Test plan

  • cargo test -p sandlock-cli test_learn
  • sandlock learn -o /tmp/profile.toml -- <cmd>
  • sandlock run -p /tmp/profile.toml -- <cmd>
